Data Stability in Clustering: A Closer Look

Author

  • Lev Reyzin
Abstract

We consider the model introduced by Bilu and Linial [12], who study problems for which the optimal clustering does not change when distances are perturbed. They show that even when a problem is NP-hard, it is sometimes possible to obtain efficient algorithms for instances resilient to certain multiplicative perturbations, e.g. on the order of O(√n) for max-cut clustering. Awasthi et al. [6] consider center-based objectives, and Balcan and Liang [9] analyze the k-median and min-sum objectives, giving efficient algorithms for instances resilient to certain constant multiplicative perturbations. Here, we are motivated by the question of how far these assumptions can be relaxed while still allowing for efficient algorithms. We show there is little room to improve these results by giving NP-hardness lower bounds for both the k-median and min-sum objectives. On the other hand, we show that multiplicative resilience parameters, even only on the order of Θ(1), can be so strong as to make the clustering problem trivial, and we exploit these assumptions to present a simple one-pass streaming algorithm for the k-median objective. We also consider a model of additive perturbations and give a correspondence between additive and multiplicative notions of stability. Our results provide a close examination of the consequences of assuming even constant stability in data.
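To make the notion of multiplicative perturbation resilience concrete, here is a minimal sketch on a toy instance. It is not the paper's algorithm: it brute-forces the optimal k-median clustering, scales every pairwise distance by a random factor in [1, alpha], and checks whether the optimum changes. The function names, the toy points, and the choice of alpha = 2 are illustrative assumptions. Random sampling can only expose a violation of resilience; it cannot prove resilience.

```python
from itertools import combinations
import random

def kmedian_opt(dist, k):
    """Brute-force the optimal k-median clustering (centers are data points)."""
    n = len(dist)
    best_cost, best_clusters = float("inf"), None
    for centers in combinations(range(n), k):
        clusters = {c: [] for c in centers}
        cost = 0.0
        for p in range(n):
            # Assign each point to its nearest chosen center.
            nearest = min(centers, key=lambda c: dist[p][c])
            clusters[nearest].append(p)
            cost += dist[p][nearest]
        if cost < best_cost:
            best_cost = cost
            best_clusters = frozenset(frozenset(v) for v in clusters.values())
    return best_clusters

def perturb(dist, alpha, rng):
    """Scale each pairwise distance by a factor in [1, alpha], symmetrically."""
    n = len(dist)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            out[i][j] = out[j][i] = dist[i][j] * rng.uniform(1.0, alpha)
    return out

def survives_sampled_perturbations(dist, k, alpha, trials=200, seed=0):
    """True if no sampled perturbation changes the optimal clustering."""
    rng = random.Random(seed)
    base = kmedian_opt(dist, k)
    return all(kmedian_opt(perturb(dist, alpha, rng), k) == base
               for _ in range(trials))

# Toy instance (an assumption for illustration): two well-separated pairs on a line.
coords = [0, 1, 10, 11]
dist = [[abs(a - b) for b in coords] for a in coords]
stable = survives_sampled_perturbations(dist, k=2, alpha=2.0)
```

On this toy instance the two pairs are far enough apart that doubling any distance cannot make a cross-pair assignment cheaper, so the optimal clustering stays {0, 1}, {10, 11} under every sampled perturbation.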




Publication date: 2012